*********READ ME************************

THE IMPACT OF SOCIAL NETWORKS ON EITC CLAIMING BEHAVIOR

Published: Review of Economics and Statistics

Author: Riley Wilson

*****************************************
This folder includes the data and code used for all analysis in the 
manuscript "The Impact of Social Networks on EITC Claiming Behavior."

The Social Connectedness Index (SCI) was constructed by Bailey et al. (2018b). These data 
were obtained through a contractual agreement and cannot be shared. However, Bailey et al. 
(2018b) give specific directions on how researchers can access the data. Further 
questions about these data can be directed to the authors.

As such, any variable constructed from the SCI is not available, and analysis using these variables
will not run. This includes the csv file County_County.csv and the file sci_distpred.dta, which is used for the IV analysis in Appendix Tables A15 and A16.

ALL CODE WAS WRITTEN AND EXECUTED ON STATA 15


CONTENTS:

.do files:
	sci_exposureprep.do:							includes code to prepare the SCI data and append it to existing data
	network_eitc_analysis_final.do:					includes all of the analysis in the paper
	
data files:
	cty_eitcnetwork1999_2013_nosci.dta:				main dataset used for most of the analysis
	hh_selfemp_network05_17_nosci.dta: 				dataset constructed from ACS microdata to analyze employment responses in ACS (e.g., Table 4)
	dma_eitcgoogletrend_network04_17_nosci.dta:		dataset used when analysing DMA-level Google Trends
	state_eitcrates1985_2018.dta:					dataset containing state-level EITC parameters for each year (used to create Figure 1)
	state_eitcrateswide1985_2018.dta:				same dataset as above, but in wide format (used to create Figure A1)
	county_pop_centroids2010.dta:					contains county population and population centroids from 2010
	master_cty_distance_long.dta: 					contains county-to-county distances used to control for distance in robustness Table A6
	seer_pop90_18.dta:								contains county-level annual population used in Table A6 to control for population.
	cty_dma_xwalk2014.dta: 							contains county to DMA crosswalk used in Table A6 to include DMA by year fixed effects
	migration_rates.dta:							contains county migration rates used to control for migration in Table A7
	provider_levelcty.dta:							contains county-level counts of high-speed internet providers used in Table A13; obtained from Dettling et al. (2018)
	cty_interstates.dta: 							contains county-level data on interstate connections
	NCHSURCodes2013.xlsx: 							contains county urbanicity scale to isolate non-metro centers for IV analysis in Table A16.
	eitc_tweets2010_2019.dta: 						dataset used in Appendix B when analysing Twitter trends
	tweet_words.dta: 								dataset used in Appendix B when analysing Twitter word usage
	
	shapefiles->countycoord.dta & countydb.dta:		contain county-level shapefiles used in Figures 2 and A8
	shapefiles->statecoord.dta & statedb.dta:		contain state-level shapefiles used in Figure A1
	shapefiles->freewaycoord.dta:					contains interstate shapefile coordinates used for analysis in Tables A15-A16 and Figure A8
	
	prep_files->cty_eitcrange2000.dta: 							dataset needed to create final data if you are merging on the proprietary SCI data
	prep_files->cty_sameinterstate.dta: 						dataset needed to create final data if you are merging on the proprietary SCI data
	prep_files->dma_stfips_xwalk.dta:							dataset needed to create final dma data if you are merging on the proprietary SCI data
	